Improving speech synthesis quality by reducing pitch peaks in the source recordings
Abstract
We present a method for improving the perceived naturalness of corpus-based speech synthesizers. It consists of removing pronounced pitch peaks, which typically lead to noticeable discontinuities in the synthesized speech, from the original recordings. We perceptually evaluated this method using two concatenative and two HMM-based synthesis systems, and found that applying it to the source recordings improved the naturalness of the synthesizers and had no effect on their intelligibility.
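As a rough illustration of what "removing pronounced pitch peaks" could look like at the level of an F0 contour, the sketch below caps voiced F0 values at a multiple of the utterance median. The function name `reduce_pitch_peaks`, the `max_ratio` threshold, and the median-based rule are all assumptions made for illustration; the abstract does not state the authors' actual peak criterion. The sketch also only edits the contour: applying the modified contour to audio would need a resynthesis step that is not shown.

```python
from statistics import median


def reduce_pitch_peaks(f0_hz, max_ratio=1.5):
    """Cap voiced F0 values at max_ratio times the median voiced F0.

    f0_hz: per-frame F0 values in Hz, with 0.0 marking unvoiced frames.
    max_ratio: hypothetical threshold; not taken from the paper.
    Returns a new list; unvoiced frames are left untouched.
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        # Nothing to modify when no frame is voiced.
        return list(f0_hz)
    ceiling = max_ratio * median(voiced)
    return [min(f, ceiling) if f > 0 else f for f in f0_hz]


# Toy contour with one pronounced peak at 200 Hz.
contour = [0.0, 100.0, 110.0, 200.0, 105.0, 0.0]
flattened = reduce_pitch_peaks(contour)
```

A hard cap is the simplest possible choice here; a smoother compression of values above the ceiling would be an equally plausible variant.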
Similar resources
Maximum-likelihood dynamic intonation model for concatenative text-to-speech system
In this work we present a Maximum Likelihood (ML) joint pitch curve model, inspired by the HMM TTS synthesis concept. This model provides an optimal solution for the coarse target intonation curve (3 points per syllable) and incorporates both static and dynamic pitch values for better utterance intonation modeling. The coarse intonation curve may be optionally combined with the original pitch ex...
Quality improvement of PSOLA analysis-synthesis using partial zero-phase conversion
This paper discusses two issues in improving the quality of F0-modified speech based on PSOLA analysis-synthesis. Previous studies [1][2] pointed out that the location of a PSOLA window influences the quality of synthesized speech, and one of them claimed that the center of a window should be located at a pitch pulse in the source waveform. However, pitch pulse detection sometimes fails due to...
On Reducing the Buzz in LPC Synthesis
A method for reducing the characteristic buzz from LPC synthetic speech is presented. The method consists of using a non-impulse source to excite the LPC synthesizer during voiced sounds. One novel feature is that the temporal parameters of the source are kept in fixed proportion to the pitch period. An extensive perceptual experiment has shown that the resulting quality of the synthe...
An Algorithm for Locating Fundamental Frequency (f0) Markers in Speech
Princy Dikshit, Old Dominion University, December 2004. Director: Dr. Stephen A. Zahorian. Speech has been the principal form of human communication since it began to evolve at least one hundred thousand years ago. Speech is produced by vibrations of the vocal cords. The rate of vibration of the cords is called fundamental freq...
Enhancement of electrolaryngeal speech by spectral subtraction, spectral compensation, and introduction of jitter and shimmer
An electrolarynx, a verbal communication aid used by laryngectomy patients, is a vibrator held against the neck tissue to provide excitation to the vocal tract, as a substitute for that provided by the glottal vibrations. Although the user can set the vibration level and pitch, dynamic control of level, voicing, and pitch during speech production is not feasible. In addition to this basic limi...